Abstract. The operational characteristics of a linear neural network image processing system based on the brain's vision system are investigated. The final stage of the network consists of edge detectors of various orientations arranged in a feature map, corresponding to the primary visual cortex, or V1. The lateral geniculate nucleus (LGN) is modeled as a preprocessing stage. Excitatory forward and inhibitory backward connections exist between the LGN and V1. Using a method of reconstructing the input images from V1 activities, the simulations show that images can be faithfully represented in V1 by the proposed network. The signal-to-noise ratio of the image is improved by the representation, and compression ratios of well over two hundred are possible. Lateral interactions between V1 neurons sharpen their orientational tuning. We further study the dynamics of the processing, showing that the rate of decrease of the representation error is maximized for the receptive fields used, and we develop a Fokker-Planck equation for a more detailed prediction of the error value vs. time. Finally, we show how the eigenvalues of the covariance matrix of the inputs can be employed to predict the rate of error decrease.

1 Introduction 

The companion paper to this one may be found at http://arxiv.org/abs/q-bio.NC/0505011.

In that work we described a neural network which developed oriented edge detectors and a feature map very closely resembling those found in the primate visual system. We postulated that inhibitory backward connections from V1 to the LGN play an important role in the transfer of information from the LGN to V1, and showed that the rate of information transfer from the LGN to V1 is maximized with the receptive fields developed.

Here we use the mature network from that research to investigate the 
dynamics of image processing in the visual system. The present work is 
divided into two parts. 

The first set of analyses focuses on the properties of the representation obtained by the network. We will show that natural images can be faithfully represented by the collection of edge detectors developed under Hebbian learning rules, that the signal-to-noise ratio of the information is improved by the feedback mechanism described, and that a sharpening of the orientation tuning of V1 neurons results from the lateral interactions between V1 neurons. Along the way, we explore the compression possibilities of the algorithms used in the simulations.

The second half of the investigation is a study of the actual dynamics of the image processing done by the network. It is demonstrated that the signals in both the LGN and V1 are transient, as expected, and time series of the neural activities are compared to experimental results. We also show how the network's behavior can be predicted from a knowledge of the eigenvalues of the covariance matrix of the inputs.

1.1 Non-orthogonal Representations 

If the receptive fields of cortical neurons can detect all or most of the essential features of an image, then it should be possible to reconstruct the image from the responses of the feature detectors. Such a reconstruction is useful for investigating various properties of the encoding of an image into V1 activities. A natural choice for the reconstruction is to use the receptive fields of the V1 neurons as a set of basis vectors within the space of possible images.

Each feature-detecting neuron in V1 has only a limited range of image elements that it can respond to. Therefore, each feature detector can encode only a very small amount of information about the visual scene. The simplicity of the receptive fields is compensated for in the brain by the use of large numbers of neurons. The processing system must be able to combine these many simple elements into a coherent, detailed interpretation of the image. Because the receptive fields are of a simple form and each has a relatively large extent in the visual field, there must be a considerable overlap between them to cover the image features that must be represented. Taken as a set of basis elements for the representation of the image, the receptive fields of V1 neurons thus form a non-orthogonal basis.

If the set of basis elements is orthonormal, the coefficients of the representation are simply the inner products of the image with the basis vectors. When the basis elements are non-orthogonal, there are various ways of calculating the coefficients, typically with the aim of minimizing the squared error between the representation and the input image. Numerous works have been published on the Gabor scheme of image representation, which also uses a non-orthogonal set of basis elements, and these works are relevant to the present investigation. In the Gabor scheme, the basis elements are trigonometric functions along particular directions in space, masked by Gaussian functions. Some authors have proposed dual functions to find Gabor coefficients by direct integration [1, 2]. Others employ neural networks which arrive at the coefficients of the representation of an image [3, 4, 5]. Some work [6, 7, 8, 9] has also been done on image representation using feature detectors similar to the ones described here, but to our knowledge no previous work has used feature detectors arranged in a feature map such as is found in the visual cortex.
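A two-dimensional toy example shows the difference between the two cases. The sketch below is ours and is not part of the network. It assumes two hypothetical unit basis vectors 45 degrees apart and a hypothetical "image" x = (1, 1). Using the raw inner products as coefficients over-counts the direction the two vectors share. Solving the normal equations (the least-squares solution) recovers x exactly.

```python
import math

def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(p * q for p, q in zip(u, v))

def least_squares_coeffs(b1, b2, x):
    """Coefficients (c1, c2) minimizing |x - c1*b1 - c2*b2|^2 via the 2x2 normal equations."""
    g11, g12, g22 = dot(b1, b1), dot(b1, b2), dot(b2, b2)   # Gram matrix entries
    r1, r2 = dot(b1, x), dot(b2, x)                          # inner products with x
    det = g11 * g22 - g12 * g12                              # nonzero for independent b1, b2
    return (g22 * r1 - g12 * r2) / det, (g11 * r2 - g12 * r1) / det

def combine(b1, b2, c1, c2):
    """Reconstruction c1*b1 + c2*b2."""
    return [c1 * p + c2 * q for p, q in zip(b1, b2)]

# Hypothetical toy basis: two unit vectors 45 degrees apart (not orthogonal).
b1 = [1.0, 0.0]
b2 = [math.cos(math.pi / 4), math.sin(math.pi / 4)]
x = [1.0, 1.0]

naive = combine(b1, b2, dot(b1, x), dot(b2, x))   # inner products used as coefficients
c1, c2 = least_squares_coeffs(b1, b2, x)
best = combine(b1, b2, c1, c2)                    # least-squares reconstruction
```

Here `naive` overshoots to about (2, 1), while `best` matches x. The least-squares coefficients are about (0, 1.414).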

2 The Computational Method 

The network is exactly the same as that of the companion paper, except that the weights have their mature values and are not updated. For reference, the dynamical equations are restated below.

2.1 Algorithms 

The algorithms for the dynamics of the network are as follows. The variables $x_i$, $y_j$, and $a_k$ denote the input to a photoreceptor at position i, the activity of a retinal ganglion cell at position j, and the activity of a V1 neuron at position k, in their respective neural layers.

Input images are convolved with a difference-of-Gaussians function representing the receptive fields of the center-surround retinal ganglion cells. We put

$$ y_j = \sum_i g_{ij} x_i \qquad (1) $$

where $g_{ij}$ is the difference-of-Gaussians function. The $y_j$ are the initial activities in the LGN layer. There are feedforward connections from the LGN to V1, and lateral connections between V1 neurons. The activity of neurons in layer V1 in terms of the activities in the LGN layer is given by

$$ a_k = \sum_j w_{jk} y_j \qquad (2) $$

where $w_{jk}$ is the strength of the connection from neuron j in the LGN to neuron k in V1. The lateral interactions between V1 neurons may also be included, in which case we have

$$ a_l = \sum_{j,k} h_{kl} w_{jk} y_j \qquad (3) $$

with $h_{kl}$ the lateral connection between V1 neurons k and l. That is, the forward and lateral interactions are applied in succession: the lateral interaction is computed after the forward interaction.
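Below is a minimal one-dimensional sketch of equations 1 to 3. It assumes a hypothetical difference-of-Gaussians profile with center and surround widths of 1 and 2 samples and unit-area lobes. The network's actual kernel, image size, and weights are not reproduced here. The weight array is indexed as `w[j][k]` to match $w_{jk}$, and the lateral array as `h[k][l]` to match $h_{kl}$.

```python
import math

def dog(d, sigma_c=1.0, sigma_s=2.0):
    """Hypothetical 1-D difference-of-Gaussians weight at offset d.

    Both lobes have unit area, so the profile sums to about zero and a uniform
    input produces almost no response away from the borders.
    """
    center = math.exp(-d * d / (2 * sigma_c ** 2)) / (math.sqrt(2 * math.pi) * sigma_c)
    surround = math.exp(-d * d / (2 * sigma_s ** 2)) / (math.sqrt(2 * math.pi) * sigma_s)
    return center - surround

def lgn_activities(x):
    """Eq. (1): y_j = sum_i g_ij x_i, with g_ij = dog(i - j)."""
    n = len(x)
    return [sum(dog(i - j) * x[i] for i in range(n)) for j in range(n)]

def v1_forward(y, w):
    """Eq. (2): a_k = sum_j w_jk y_j, where w[j][k] is the LGN j -> V1 k weight."""
    return [sum(w[j][k] * y[j] for j in range(len(y))) for k in range(len(w[0]))]

def v1_lateral(a, h):
    """Eq. (3): a_l = sum_k h_kl a_k, applied after the forward pass."""
    return [sum(h[k][l] * a[k] for k in range(len(a))) for l in range(len(h[0]))]

# A uniform field and a step edge, both 41 samples long.
flat = lgn_activities([1.0] * 41)
edge = lgn_activities([0.0] * 20 + [1.0] * 21)
```

With these assumed widths, the uniform field maps to almost zero except near its borders. The step edge produces a clearly nonzero response at the edge.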

The backward connections are inhibitory and decrease the activity of the 
LGN relay neuron: 

$$ \frac{dy_j}{dt} = -\sum_k w_{jk} a_k \qquad (4) $$

The backward connections have the same values as the forward connections, 
since they learn from the same inputs. 

An image is convolved with the difference-of-Gaussians function to form an input LGN image, which is stored in an array. A V1 neuron is chosen and its activity is calculated from equation 2. In some simulations, the lateral interaction is then applied to the neuron's neighbors. Next, the effect of the neuron's activity on the LGN through the backward connections is applied to the LGN activity. This defines one step. The process is repeated until each of the V1 neurons has been chosen exactly once, which defines one epoch.
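The step-and-epoch procedure can be sketched as follows, without lateral interactions. Two assumptions are ours. First, the weight vectors are normalized to unit length. Second, the backward inhibition of one step is applied with unit gain, y ← y − a_k w_k, as a discrete stand-in for the backward connections. Under these assumptions, each step removes the component of the LGN activity along one weight vector. This amounts to cyclic coordinate descent on the squared reconstruction error, so the error E = ½ Σ_j y_j² never increases. The four three-dimensional weight vectors are hypothetical toy values, chosen to be non-orthogonal and to span the space.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean length."""
    n = math.sqrt(sum(c * c for c in v))
    return [c / n for c in v]

def run_epochs(y0, weights, epochs):
    """Neuron-by-neuron scheme of Section 2.1, without lateral interactions.

    weights[k] is the (assumed unit-norm) weight vector w_k of V1 neuron k.
    Returns the integrated activities A_k, the final LGN activities, and the
    error E = 1/2 sum_j y_j^2 recorded after every step.
    """
    y = list(y0)
    A = [0.0] * len(weights)
    errors = [0.5 * sum(v * v for v in y)]
    for _ in range(epochs):
        for k, w in enumerate(weights):                    # each neuron chosen once per epoch
            a = sum(wj * yj for wj, yj in zip(w, y))       # forward activity, eq. (2)
            A[k] += a                                      # running "integral" of a_k
            y = [yj - a * wj for yj, wj in zip(y, w)]      # backward inhibition of the LGN
            errors.append(0.5 * sum(v * v for v in y))
    return A, y, errors

# Hypothetical toy weights: non-orthogonal, and together they span 3-D space.
weights = [normalize(v) for v in ([1, 1, 0], [0, 1, 1], [1, 0, 1], [1, -1, 0])]
y0 = [0.3, -0.8, 0.5]
A, y_final, errors = run_epochs(y0, weights, epochs=20)

# Reconstruction: integrated activities as coefficients, weight vectors as basis.
recon = [sum(A[k] * weights[k][j] for k in range(len(weights))) for j in range(len(y0))]
```

Because these toy weight vectors span the space, the LGN activity is driven to zero and the accumulated activities reproduce `y0`. If the weight vectors did not span the space, the residual would instead approach the component of `y0` orthogonal to all of them.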

2.2 The Error Measure 

In order to analyse the representation of the image in the V1 layer, the image is reconstructed from the V1 activities. Integration of equation 4 yields a formula for the LGN activities $y_j(t)$ in terms of the initial activities $y_j(0)$ and the V1 activities:

$$ y_j(t) = y_j(0) - \sum_k w_{jk} \int_{t'=0}^{t} a_k(t')\,dt' \qquad (5) $$

Now, if one defines an error measure in terms of the LGN activities,

$$ E = \frac{1}{2} \sum_j y_j^2 , \qquad (6) $$

then upon inserting equation 4 into the time derivative of E and using equation 2,

$$ \dot{E} = \sum_j y_j \dot{y}_j \qquad (7) $$

$$ = -\sum_{j,k} a_k w_{jk} y_j \qquad (8) $$

$$ = -\sum_k a_k^2 . \qquad (9) $$

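It is seen, then, that the error decreases whenever any V1 neuron is active. Since E is bounded below by zero, it keeps decreasing until every $a_k$ vanishes, that is, until the LGN activity has no component along any weight vector $w_k$. If the weight vectors span the space of LGN images, the LGN activities must therefore all go to zero, and by equation 2 so must the V1 activities. Since the left-hand side of equation 5 then goes to zero, the dynamical equations indicate that the time integrals of the V1 activities are the coefficients of the representation of the LGN image, with the connection strengths to the V1 neurons as the basis set.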

This can be shown another way, using the pseudoinverse. Define y as the vector with components $y_j$, a as the vector with components $a_k$, and $w_k$ as the vector with components $w_{jk}$ running over the index j. The matrix W is formed by taking the vectors $w_k$ as its rows. Then, for example, equation 2 is written

$$ a = W y \qquad (10) $$

That is, $a_k$ is the kth component of the product of W with the vector y, or equivalently the inner product of $w_k$ with y. With $W^{+}$ the pseudoinverse of W,

$$ W^{+} = (W^{T} W)^{-1} W^{T} \qquad (11) $$

(when $W^{T}W$ is singular, $W^{+}$ is understood as the Moore-Penrose pseudoinverse),

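then, for a step size α small enough that the series converges ($0 < \alpha < 2/\lambda_{\max}$, where $\lambda_{\max}$ is the largest eigenvalue of $W^{T}W$), a matrix identity gives

$$ W^{T} (W^{+})^{T} = \alpha \sum_{n=0}^{\infty} W^{T} W (I - \alpha W^{T} W)^{n} \qquad (12) $$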

This is the projection onto the subspace spanned by the vectors $w_k$. But this is exactly the reconstruction described above, since $(I - \alpha W^{T} W) y$ is the activity of the LGN layer after one iteration. An iteration consists of determining the V1 activities by equation 2 and inhibiting the LGN through the backward inhibitory connections. Multiplying the result by W gives the V1 activities after one epoch, and multiplying again by $W^{T}$ gives the reconstruction after one epoch. Taking $W^{T}$ out of the sum, the sum of the first N terms is the sum of the V1 activities from each of the first N epochs, i.e., the "integral" of the V1 activities. Thus the reconstruction method used here is simply the projection of the input image onto the subspace spanned by the weight vectors. This also shows that the network presented cannot be improved upon for finding the coefficients that minimize the mean squared error, since the pseudoinverse does just that.
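The series in equation 12 can be checked numerically. The sketch below uses the batch form implied by the equation: all V1 activities of an epoch are computed from the same LGN state, with step size α. This differs from the neuron-by-neuron order of Section 2.1. The two hypothetical weight vectors span only the x-y plane of a three-dimensional LGN space. The partial sums should therefore converge to the projection of `y0` onto that plane, not to `y0` itself. With α = 1, the nonzero eigenvalues of WᵀW (about 1.71 and 0.29) satisfy the usual convergence condition 0 < α < 2/λ_max.

```python
import math

def matvec(M, v):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def transpose(M):
    """Transpose of a list-of-rows matrix."""
    return [list(col) for col in zip(*M)]

def batch_reconstruction(W, y0, alpha, epochs):
    """Partial sum of eq. (12): alpha * sum_n W^T W (I - alpha W^T W)^n y0.

    Also returns the LGN residual (I - alpha W^T W)^epochs y0.
    """
    Wt = transpose(W)
    y = list(y0)
    recon = [0.0] * len(y0)
    for _ in range(epochs):
        a = matvec(W, y)                                   # V1 activities, eq. (10)
        back = matvec(Wt, a)                               # W^T a, the backward signal
        recon = [r + alpha * b for r, b in zip(recon, back)]
        y = [yj - alpha * b for yj, b in zip(y, back)]     # LGN after inhibition
    return recon, y

s = 1 / math.sqrt(2)
W = [[1.0, 0.0, 0.0],      # w_1 (hypothetical)
     [s, s, 0.0]]          # w_2, 45 degrees from w_1; both lie in the x-y plane
y0 = [0.3, -0.8, 0.5]
recon, residual = batch_reconstruction(W, y0, alpha=1.0, epochs=200)
```

The third component is orthogonal to both weight vectors and is never touched. It stays in the LGN residual, which is the sense in which the reconstruction is a projection.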

3 Characteristics of the Image Representations Produced by the Network

The visual system must contend with a number of constraints in its task of representing images. Each V1 neuron has only a limited range of features that it can respond to, and so any representation must be built up from a large number of very simple descriptions. The metabolic energy of the organism is limited, requiring that the total activity of neurons be minimized. Speed of computation is essential for responding to a changing environment. Finally, internal noise generated by the image processing system should not interfere with the image representation.

Let us examine, then, the various physical limitations of the particular 
processing elements making up the vision system, and determine whether 
and how the proposed model addresses the requirements they introduce. 

3.1 The Lena Image 

The prime requirement is that the network represent images faithfully. We apply the network to a portrait in order to see whether the reconstruction resembles the input image. The input to the network is taken to be the initial activities of the LGN neurons, i.e., the convolution of the original image with the difference-of-Gaussians function.

The Lena image is shown in figure 1a. This is the image considered to be input on the retina. The image is convolved with the difference-of-Gaussians function, which imitates the activity of the retinal ganglion cells. The result forms the input to the network and is shown in figure 1b.

After one epoch of processing, the reconstructed image begins to resemble the input image (fig. 1c). After twenty epochs, the reconstruction approximates the original (fig. 1d). This shows that the information in the input image is well represented by the integrals of the V1 activities.

3.2 The Effect of Lateral Interactions 

When lateral interactions between V1 neurons are included, the results after one and after twenty epochs are shown in figs. 2a and 2b respectively. We find that the lateral interactions improve the image quality slightly. The reader is reminded that the lateral interactions were included in the original simulations of the development of the feature map in order to produce the spatial features of the map itself, not for the sake of image processing. The lateral interactions may have additional functions in the realm of hyperacuity which are not apparent in the image presented in this example. Later, we test the contribution of the lateral interactions to the sharpening of orientation tuning with a simple edge image.

3.3 Reconstruction Without Inhibitory Feedback 

When the processing is done without inhibitory feedback (and without lateral interactions), only one epoch need be calculated, since the signal is feedforward only. The reconstructed image is shown in figure 2c. It is not only inferior to the image after twenty epochs with feedback, but also slightly worse than the result of processing with feedback after one epoch. This shows that, with the reconstruction method used, the proposed feedback scheme substantially improves the representation of the image.

3.4 The Activities in V1

The V1 activities after twenty epochs, with inhibitory feedback and without lateral interactions, are shown in figure 2d. Absolute values are shown, since the sign of any weight vector in the network is arbitrary. This image is difficult to interpret directly, hence the use of the reconstruction technique to analyse the representation. Although the brain probably does not reconstruct the image as we have done, the reconstruction does show that the image information is indeed present in the V1 activities and could be accessed by higher stages of processing in the brain.
The notable feature of the image of V1 activities is the presence of continuous lines of low activity. Their presence is understandable: at a singular point of the feature map there are edge-detecting weight vectors of all orientations, so at least one of the neurons will have zero activity. Away from the singular points the feature map is continuous, and the image is also continuous. One would therefore expect continuity in the lines of constant activity value in the V1 activity image. Hence the lines connect singular points to one another, or to other lines of zero activity.

In the LGN, such lines correspond to edges in images. We suppose that when the neurons supplying information about edges are quiet, other kinds of neurons, such as those supplying information about shading or color, can win any neural competition that may exist. This is consistent with psychophysical data implying that such information is indeed taken from the border area between image regions. Perhaps there are neurons in V1 which supply information, such as on textures, in a similar fashion.

3.5 Walsh Patterns 

Walsh patterns are images consisting of rectangular tilings of black and white areas. The spatial frequency of the tiling may be varied in both the horizontal and vertical directions. Walsh patterns were originally used as a set of basis elements for image representation. They are used here for the sake of comparison with neuroscience experiments. As simple figures with well-defined edges, they are also appropriate for evaluating the performance of the image processing network.

Walsh pattern (1,1) is a tiling of frequency 1 in both directions. The original image, its convolution with the difference-of-Gaussians function, and the reconstructed images at one epoch and twenty epochs are shown in figure 3. In the reconstruction after one epoch, only a few V1 neurons contribute to the reconstruction: those almost directly over the edges of the pattern. The size and shape of the V1 receptive fields are visible because the neurons involved are well separated. At twenty epochs, the representation in V1 is much more distributed. The neural activities combine in such a way as to form edges, even where there are no single neurons with edges at those locations. Thus the reconstruction is much smoother.

Walsh patterns (1,2), (2,2), and (4,4) are presented in figures 4, 5, and 6. The center edge in figure 4d is sharper because there is greater contrast in the original image there. Figure 6c shows an interesting effect which is confirmed by experimental results in a later section. The reconstructed image after one epoch is closer to the original than for the simpler patterns, because the receptive field size and shape happen to be close to the size and shape of the image features to be represented. Therefore the image is well represented with comparatively little processing. As shown later, this results in a decrease in LGN and V1 activities that is more rapid than expected, and such a decrease is seen experimentally as well.



3.6 Improvements in the Signal to Noise Ratio 

Noise is pervasive in the vision processing system. Neurons may fire spontaneously, or there may be voltage fluctuations along afferent fibers, random release of neurotransmitter, and random voltage fluctuations in the receiving neuron. Thus, some immunity to noise must be built into the system.

The summation of inputs from multiple LGN neurons onto each V1 neuron reduces noise through averaging. The noise considered here is mixed into the signal along the geniculate axon. The excitation of a V1 neuron is given by

$$ a_k = \sum_j (y_j + \nu_{jk}) w_{jk} \qquad (13) $$

where $\nu_{jk}$ is a random variable uniformly distributed between -0.1 and 0.1, drawn separately for every j and k. Let $l^2 = \sum_j y_j^2$ be the squared length of the LGN image, and let $v^2$ be the variance of the noise distribution. The squared length of the LGN image divided by the squared length of the noise yields a signal-to-noise ratio of 0.21 for the input. The noise is superimposed on the LGN values in figure 7a to give a qualitative illustration of the noise level.

The expected factor κ by which the signal-to-noise ratio should increase from the LGN to V1 is given by

$$ \kappa = \frac{1}{N} \sum_k \frac{\left[ \sum_j |w_{jk}| \right]^2}{\sum_j w_{jk}^2} \qquad (14) $$

where N is the number of V1 neurons and $w_{jk}$ is the connection strength from neuron j of the LGN to neuron k of V1. This equation results from the assumption that the activities in V1 are mainly determined by edges in the input image. The signal contributions of the inputs then add coherently, while their noise contributions add in quadrature. It was found that κ = 12.01. This is slightly less than the value of 16 one would expect for a sum over 16 inputs, because the weights of the inputs are not all identical.
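Assume, as stated, that a neuron's input is dominated by an edge matched to its receptive field. Each input then contributes signal in proportion to |w_jk|, while independent noise contributions add in quadrature. The power signal-to-noise gain of one neuron is therefore (Σ_j |w_jk|)² / Σ_j w_jk². The sketch below computes that ratio and its average over neurons. It is our reading of the gain formula, and the weight values are hypothetical. Sixteen equal weights give exactly 16. By the Cauchy-Schwarz inequality, any 16 weights of unequal magnitude give less, as the text notes.

```python
def snr_gain(weights):
    """Power SNR gain (sum_j |w_j|)^2 / sum_j w_j^2 of one neuron.

    Assumes each input carries a signal of equal magnitude whose sign matches
    its weight, plus independent noise of equal variance on every input.
    """
    s1 = sum(abs(w) for w in weights)   # coherent signal amplitude, per unit input signal
    s2 = sum(w * w for w in weights)    # output noise variance, per unit input noise variance
    return s1 * s1 / s2

def mean_snr_gain(weight_vectors):
    """Average of snr_gain over N neurons (our reading of the gain formula)."""
    return sum(snr_gain(w) for w in weight_vectors) / len(weight_vectors)

equal = [0.25] * 16                 # hypothetical: 16 identical weights
unequal = [1.0] * 8 + [0.5] * 8     # hypothetical: 16 weights of two sizes
```

For these values, `snr_gain(equal)` is 16.0 and `snr_gain(unequal)` is 14.4.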

For the feedback connections,

$$ \Delta y_j = -\sum_k a_k w_{jk} , \qquad (15) $$

there is again an averaging effect from the summation. By a calculation similar to that for the forward connections, the expected increase in the signal-to-noise ratio for the backward connections is found to be a factor of 10.66.
For the reconstruction of the image from the integrals of the V1 activities,

$$ \mathrm{recon}_j = \sum_k A_k w_{jk} , \qquad (16) $$

where $A_k$ is the time integral of the activity $a_k$, there is again an averaging effect from the summation. Since equation 16 sums over the same backward weights $w_{jk}$ as equation 15, the expected increase in the signal-to-noise ratio is again a factor of 10.66. Together, then, the forward and backward connections yield a factor of 12.01 × 10.66 = 128.03.
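The arithmetic behind the combined prediction is short enough to spell out. It uses the factors quoted in the text (12.01 forward, 10.66 backward) and the input ratio of 0.21. The product reproduces the computed value listed in Table 1 for the Lena image with noise on the first iteration.

```python
forward_gain = 12.01    # LGN -> V1 factor quoted in the text
backward_gain = 10.66   # backward / reconstruction factor quoted in the text
input_snr = 0.21        # LGN image vs. noise, quoted in the text

total_gain = forward_gain * backward_gain
predicted_snr = input_snr * total_gain
print(round(total_gain, 2), round(predicted_snr, 2))   # 128.03 26.89
```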

In a simulation, the measured increase was relatively close to the predicted value. Twenty feedforward-feedback iterations were applied, with the noise added on the first iteration only. The signal-to-noise ratio of the resulting reconstructed image was 28.39, a factor of 135.2 over the input signal-to-noise ratio of 0.21.

When the noise is applied on every iteration, the input signal-to-noise ratio is effectively much lower, because the variance of the total noise added is increased by a factor of twenty. One would then predict an output signal-to-noise ratio of 1.35. A simulation, whose results are shown in figure 7b, obtained 1.21 for this value.

The signal-to-noise results are summarized below. 



Image                            Computed S/N    Measured S/N
Lena, noise on 1st iteration     26.89           28.39
Lena, noise on all iterations    1.35            1.21

Table 1 Signal-to-noise ratios for the Lena image.
3.7 Entropy Reduction

The information-carrying capacity of the optic nerve is limited, as is that of the connections between the LGN and V1, and therefore we expect signal compression to be one of the features of the system. One of the requirements of any information processing system is to use the minimum amount of bandwidth needed to perform its task. We define the "bandwidth" used by the network for representing the image as the entropy derived from the activity histograms of the neural layers. Here we show that this bandwidth decreases as the processing proceeds from one neural area to the next. This indicates that the image is represented more efficiently in the higher layers, as a smaller variety of output activities is needed for the representation.

Figure 8a shows the histogram of activities for the original, unconvolved Lena image used in the simulations. The horizontal axis runs from 0 to 255, indicating that the image intensities were encoded as eight-bit numbers. The entropy of the image may be calculated from the histogram. Recall the definition of informational entropy:

$$ I = -\sum_i p_i \log_2 p_i \qquad (17) $$

where $p_i$ is the probability of finding a neuron at activity i, obtained directly from the histogram of the activities. If all 256 intensity values were equally likely, as for a completely random image, the entropy would be exactly 8 bits.
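Equation 17 can be sketched as follows. For the eight-bit original image, the histogram bins are simply the 256 intensity values. For real-valued LGN or V1 activities, some binning must be chosen first. The uniform binning in the helper `quantize` is our assumption, since the bin layout used for figure 8 is not stated, and the entropy value depends on that choice.

```python
import math
from collections import Counter

def histogram_entropy(levels):
    """Eq. (17): I = -sum_i p_i log2(p_i), with p_i read off the histogram of `levels`."""
    counts = Counter(levels)
    n = len(levels)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def quantize(values, lo, hi, bins=256):
    """Assign real-valued activities to integer bins 0..bins-1 (bin layout is our assumption)."""
    width = (hi - lo) / bins
    return [min(bins - 1, max(0, int((v - lo) / width))) for v in values]
```

A perfectly uniform histogram over 256 levels gives 8 bits, and a single repeated value gives 0.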

Figures 8b and 8c show the histograms for the initial LGN activities and for the integrals of the V1 activities after twenty epochs. The lateral interactions were not included in this simulation. There is a dramatic narrowing of the histogram between the original image and the convolved image input to the LGN, and a slight narrowing from the LGN to V1. With the lateral interactions included, the narrowing from the LGN to V1 is increased, as shown in figure 8d.
Layer                                           Entropy
Original image                                  7.89
Convolved image                                 6.24
V1 representation, no lateral interactions      6.00
V1 representation, with lateral interactions    5.79

Table 2 The entropy values for successive stages of processing for the Lena image.

The decreases in entropy may be explained in terms of the statistics of the image being represented and the transformations performed by the network. The convolution is based on the difference-of-Gaussians function, which reduces any broad area of constant intensity to near zero. Since the original input image contained many such broad areas, the differences between their intensities were removed from the image. Likewise, the remaining edges in the LGN image could be detected by V1 neurons whose receptive fields resembled the edges. The activity of a single V1 neuron could then represent the activities of a number of LGN neurons whose activities formed an edge feature. This causes a reduction in image entropy between the LGN and layer V1. The entropy reduction is not larger because image features which are not similar to edges must be represented by relatively complex combinations of edge detectors, which require a greater variety of V1 activities. If the network contained receptive fields of greater complexity, we would expect larger reductions in entropy.

3.8 Image Compression 

The decrease in entropy described above allows the brain to process images more efficiently, either by decreasing the number of neurons involved in the processing or by decreasing the required dynamic range of each neuron. This gain in efficiency came without any special procedures being added to the network. It is therefore of interest to see whether further gains may be possible with some alterations to the processing. In this case one is not necessarily seeking the natural way of image processing used in the brain, but the most efficient way that can be found using the network described. Nevertheless, it is possible that the algorithms employed have correlates in the brain's functioning.

As before, the quality of the representation will be analysed from the images reconstructed from the V1 activities. Suppose one wishes to transmit an image in the form of the integrals of the V1 activities. The recipient knows the weight vectors and reconstructs the image with the weight vectors as a basis set and the integrals of the V1 activities as the coefficients. We wish to see whether the feature map representation is comparable in performance to other image representation schemes.

3.8.1 Decreasing the Number of Channels 

Here, a "channel" is one V1 neuron. Intuitively, if an image can be represented using fewer neurons, some compression has been achieved. An obvious approach is to send only the highest-valued V1 activities to represent the image. However, this does not work very well. The coordinates of the most active neurons must also be sent, since they change from image to image, and the reconstructed image is not good. A cutoff was chosen below which a coefficient was not included in the reconstruction sum, giving a compression ratio of only 17 (only the largest one-seventeenth of the coefficients were included). Even so, the reconstruction was poor.

3.8.2 Decreasing the Dynamic Range 

The dynamic range is the neural equivalent of the number of bits per pixel in ordinary image processing. We imagine that the neuron's output lies within some finite range, but its value can only be known to a certain accuracy. Therefore, the larger the dynamic range, the more distinct output values can be discerned. This is comparable to the bits-per-pixel measure; e.g., eight bits per pixel allow 256 shades to be discerned, whereas four bits per pixel allow only sixteen.

Image compression can be achieved by decreasing the bits per pixel, since the size of the representation in bits is the number of channels times the number of bits per pixel, or channel. Therefore decreasing the dynamic range is a form of image compression. In the brain, the dynamic range is limited by the maximum spike rate of a neuron, and gains in terms of this quantity allow more information to be processed within the limitation of the maximum possible spike rate.

This compression method was tested by converting the integral of the activity of each V1 neuron into an eight-bit number. Truncating the number by one, two, three, or more bits results in compression factors of two, four, eight, etc. The eight-bit representation was indistinguishable from the representation using floating-point numbers. This is not surprising, since the original image was an eight-bit grayscale image. It was found that compression factors of up to sixteen still produced very good reconstructions (figure 9a). A compression factor of thirty-two was still acceptable for the simple portrait used (figure 9b). The iteration of the feedforward-feedback loops seemed to be more important in these cases; the reconstruction converged slowly to the original image as the number of epochs increased.
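The conversion to eight-bit numbers and the subsequent truncation can be sketched as below. The text does not say how the integrals were mapped to eight bits. The linear min-max scaling, the need to transmit the offset and scale alongside the codes, and the mid-bin decoding are all our assumptions.

```python
def to_eight_bit(coeffs):
    """Linearly map real-valued V1 integrals onto integer codes 0..255.

    Returns the codes plus the offset and scale a receiver would need.
    """
    lo, hi = min(coeffs), max(coeffs)
    scale = (hi - lo) / 255 if hi > lo else 1.0
    return [min(255, round((c - lo) / scale)) for c in coeffs], lo, scale

def truncate_bits(codes, dropped):
    """Drop the `dropped` least significant bits, leaving 256 >> dropped levels."""
    return [c >> dropped for c in codes]

def decode(codes, lo, scale, dropped):
    """Map truncated codes back to the center of the range of 8-bit codes they cover."""
    width = 1 << dropped
    return [lo + (c * width + (width - 1) / 2) * scale for c in codes]
```

Dropping d bits leaves 256 >> d distinct levels. For example, truncating by four bits leaves sixteen.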

3.8.3 Sparse Channels 

Further compression can be obtained by selecting only one-fourth of the V1 neurons for the reconstruction, excluding every other row and column. Combined with the dynamic range reduction, this yields a compression ratio of sixty-four to one. The reconstructed images have good quality; see figure 9c.

3.8.4 The Three Methods Combined 

The method of choosing the highest-valued V1 integrals can now be added to these two methods. This last method produces a further compression factor of 3.23, for a total compression of 207. The image is of lower quality but still recognizable, which is good performance for such a high compression ratio. In fact, the reconstructed image is 272 x 272 while the V1 array is 256 x 256. Taking this into account, a further factor of 1.15 must be applied to all the compression ratios. This yields compressions of 18.4 for the dynamic-range method, 73.6 for its combination with the sparse array, and 237 for all three methods combined. The results are summarized below.
Method                                  Compression ratio    Including array size difference
I.   Highest values of V1 integrals     17                   19.6
II.  Limit dynamic range                16                   18.4
III. Use sparse V1 array                4                    4.6
IV.  Methods II and III                 64                   73.6
V.   Methods I, II, and III             207                  237

Table 3 Compression ratios obtained by various methods for the Lena image.



3.9 Sharpening of Orientational Tuning 

The inhibitory part of the lateral interactions between V1 neurons creates competition between them, since a highly active neuron will suppress the activities of its neighbors. One might therefore expect that when an edge is presented to the network, the neurons whose receptive fields are best aligned with the edge will win the competition. How does this affect orientational tuning? Without the lateral interaction, a neuron may respond somewhat to an edge that is in its receptive field but not at the optimal orientation. With the lateral interaction, such activity will be suppressed by inhibition from neighbors whose receptive fields are more closely aligned with the edge. Only when the orientation of the edge is close to the optimum for the neuron will it retain appreciable activity.

One hundred twenty-eight edges of various orientations were constructed and convolved with the difference-of-Gaussians function. One of the edges and its convolution are shown in figures 10a and 10b. The orientations spanned one-half revolution, since the sign of the V1 activities is ignored. The center of rotation was at position (128,128) in V1, and the activity of the neuron at position (128,112) in V1 was monitored during the simulations. Without the lateral interactions, the neuron exhibited broad orientational tuning, as shown in figure 11. With the lateral interactions, there is a substantial decrease in the width of the orientational tuning curve, as expected; see figure 12.

Others have also proposed mechanisms for the enhancement of orientational tuning; see, for example, ref. [10].
4 Temporal Aspects of the Image Processing



The vision system must be responsive to changes in the input, since the 
organism exists in a dynamic environment. We shall therefore investigate 
the rate at which the image representation is formed. Since experimental 
data is often in the form of time series of neural activities, we derive time 
series results from our simulations. 

4.1 Time Series of V1 Neural Activities

We expect the neural activities in both VI and the LGN to be transient, because as the inhibitory feedback is applied to the LGN, the LGN activities are damped. When the image has been completely represented by the integral of the VI activities, the LGN activities have become zero. The VI activities must then also be zero, since they depend on feedforward activation from the LGN. Since the VI activities are assumed to begin at zero, we expect a burst of activity followed by a slower dropoff. This is confirmed in simulations with Walsh patterns, as shown in figure 13.
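A minimal sketch of this transient, assuming the update implied by eq. 19 ($\dot y_j = -a_k w_{jk}$) is applied in discrete steps as $y_j \leftarrow y_j - \eta a_k w_{jk}$, with one randomly chosen VI neuron per step. The receptive fields here are hypothetical random unit vectors rather than the trained edge detectors, and the sizes and step size `eta` are assumed values. The VI activity is largest at the start and then decays, while the error $\sum_j y_j^2$ never increases: one step changes it by $-\eta a_k^2 (2 - \eta |w_k|^2)$, which is non-positive whenever $\eta |w_k|^2 \le 2$.

```python
import math
import random


def unit_vector(n, rng):
    """Random direction in R^n, used here as a stand-in receptive field."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def run_feedback(y0, weights, eta, n_steps, rng):
    """Inhibitory-feedback loop with one randomly chosen VI neuron per time step.

    Returns the VI activity a_k at each step and the error sum_j y_j^2 after each step.
    """
    y = list(y0)
    activities, errors = [], []
    for _ in range(n_steps):
        w = rng.choice(weights)
        a = sum(wj * yj for wj, yj in zip(w, y))         # forward: a_k = sum_j y_j w_jk
        y = [yj - eta * a * wj for yj, wj in zip(y, w)]   # backward inhibition of the LGN
        activities.append(a)
        errors.append(sum(yj * yj for yj in y))
    return activities, errors


rng = random.Random(1)
n_lgn, n_vi = 16, 48                                # hypothetical network sizes
weights = [unit_vector(n_lgn, rng) for _ in range(n_vi)]
y0 = [rng.gauss(0.0, 1.0) for _ in range(n_lgn)]    # hypothetical LGN input
acts, errs = run_feedback(y0, weights, eta=0.5, n_steps=2000, rng=rng)

early = sum(abs(a) for a in acts[:50]) / 50
late = sum(abs(a) for a in acts[-50:]) / 50
print(f"mean |a_k|, first 50 steps: {early:.3f}; last 50 steps: {late:.2e}")
```

Since the weight vectors here span the LGN space, the residual image is driven toward zero and the VI activity dies out with it, which is the mechanism proposed for the transience.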

These results can be compared with similar experimental findings in monkeys by Optican and Richmond et al. [11, 12], shown at the bottom of figure 13. The experimental results also show a transient burst of activity. We propose that the transience is due at least in part to inhibitory feedback from VI to the LGN.

Note that in the experimental data the more complex pattern produces a faster falloff in VI activity. The same result is seen in the simulation: the transient for Walsh pattern (4,4) is faster than for pattern (1,1). As discussed previously in section 3.5, figure c shows that the image components of this pattern happen to match the scale of the VI receptive fields, which facilitates the representation and hence increases the effectiveness of the inhibition. Since the receptive field sizes in the simulation that created the network were carefully scaled to correspond to experimentally reported sizes, the experimental and simulation results may have the same cause.

4.2 Time Evolution of the Representation Error 

Recall that, since the activity of the LGN after any time step is the difference between the reconstruction and the original image, we took the squared magnitude of the LGN activities as an error measure for the image processing
The error always decreases, and the larger the average magnitude of the VI activities (the larger the signal to VI), the more rapid the decrease in the error of the representation. This motivates the use of the eigenvectors of the covariance matrix of the inputs as receptive fields, since they maximize the signal.

We would like to compare the measured decrease in the error to the value predicted in equation 7. To do this, we first measure the actual decrease in the error vs time from simulations, and then find average values of $a_k^2$ from statistical analyses of the images. Since $\dot E$ is not constant in time, we choose a particular point in time for the measurements, namely, the beginning of the simulation.

The tests were done on three images: the Lena image, the Walsh pattern (1,2) image, and a random image (random noise with a flat distribution in the retina, convolved as usual with the difference-of-gaussians to produce the input to the LGN). A factor of 0.1 was inserted into equation 4 to make the change in the error more gradual, so that the approximation of the change in the error as a derivative in eq. 7 would be valid; this factor was compensated for in subsequent calculations. After each time step, the squared magnitude of the LGN activities was computed. Recall that the activity values are updated by choosing one VI neuron at a time, and that this defines one time step. The resulting plots of error vs time are shown in figure 14. Figure 14b shows the long-time result for the Lena image. Note that the curve decays more slowly than an exponential, showing that at later times the proportion of $\langle a_k^2 \rangle$ to $E$ decreases. As the processing progresses, the image remaining in the LGN contains fewer components that are matched to the edge detectors, because these components have been subtracted from the image by the inhibition. The remaining components are of higher order than the simple edge detectors comprising the weight vectors and do not produce values of $\langle a_k^2 \rangle$ as high in proportion to $E$.
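The role of the factor 0.1 can be checked with a short calculation. Treating the inserted factor as a step size $\eta$ in the update $y_j \leftarrow y_j - \eta a_k w_{jk}$ (an assumption about the form of eq. 4), and taking the error as $\frac{1}{2}\sum_j y_j^2$ (the normalization under which eq. 18 reads $\dot E = \sum_j y_j \dot y_j$), one step with neuron $k$ changes the error by exactly $-\eta a_k^2 (1 - \eta |w_k|^2/2)$. Dividing by $\eta$ compensates for the factor, and the remaining discrepancy from the derivative prediction $-a_k^2$ is of order $\eta$. The function `slope_ratio` and the random weights are hypothetical.

```python
import math
import random


def slope_ratio(y, weights, eta):
    """Ratio of the finite-step slope to the derivative prediction -<a_k^2>, at t = 0.

    Error is taken as E = 0.5 * sum_j y_j^2 (assumed normalization). Each VI neuron k is
    applied once to the same initial y, and the results are summed over k.
    """
    e0 = 0.5 * sum(v * v for v in y)
    measured, predicted = 0.0, 0.0
    for w in weights:
        a = sum(wj * yj for wj, yj in zip(w, y))
        y_new = [yj - eta * a * wj for yj, wj in zip(y, w)]
        measured += (0.5 * sum(v * v for v in y_new) - e0) / eta  # divide by eta to compensate
        predicted += -a * a
    return measured / predicted


def random_unit(n, rng):
    """Hypothetical unit-norm receptive field."""
    v = [rng.gauss(0.0, 1.0) for _ in range(n)]
    s = math.sqrt(sum(x * x for x in v))
    return [x / s for x in v]


rng = random.Random(0)
weights = [random_unit(16, rng) for _ in range(32)]
y0 = [rng.gauss(0.0, 1.0) for _ in range(16)]
for eta in (1.0, 0.1, 0.01):
    # for unit-norm weights the ratio is exactly 1 - eta / 2
    print(eta, round(slope_ratio(y0, weights, eta), 4))
```

With $\eta = 0.1$ the finite-step slope is within about 5% of the derivative, which is the sense in which a small factor makes the derivative approximation valid.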

4.3 Average Values for Neural Activities 

The value of $\langle \dot E \rangle$ is predicted by the dynamical equations to be $-\langle a_k^2 \rangle$. For this comparison, the values of $a_k$ were calculated by applying the weight vectors to the input LGN image. The image was not altered by any backward inhibition, since the time period of interest is the beginning of the simulation, when the effect of the inhibition on the overall LGN image is negligible. Means and variances of $a_k^2$, $y_j^2$, $\sum_j w_{jk}^2$, and $\lambda_k$ were computed in this way from 65,536 samples. The data, and comparisons to the predicted values, are summarized in the tables below.



Lena

| Quantity | Mean | Variance |
|---|---|---|
| VI activity squared | 9.89e-04 | 6.20e-06 |
| LGN activity squared | 9.26e-04 | 3.34e-06 |
| Eigenvalue | 2.28e-03 | 3.16e-05 |

Walsh Pattern (1,2)

| Quantity | Mean | Variance |
|---|---|---|
| VI activity squared | 2.93e-02 | 3.29e-02 |
| LGN activity squared | 2.10e-02 | 9.65e-03 |
| Eigenvalue | 6.63e-02 | 1.62e-01 |

Random Image

| Quantity | Mean | Variance |
|---|---|---|
| VI activity squared | 1.19e-05 | 3.02e-10 |
| LGN activity squared | 3.82e-05 | 2.53e-09 |
| Eigenvalue | 2.75e-05 | 1.56e-09 |

Tables 4, 5, 6: Means and variances of network quantities for various images.
| Image | Measured $\langle \dot E \rangle$ | Predicted $\langle a_k^2 \rangle$ | Ratio (predicted / measured) |
|---|---|---|---|
| Lena | -9.33e-04 | 9.49e-04 | 1.017 |
| Walsh Pattern (1,2) | -2.52e-02 | 2.93e-02 | 1.16 |
| Random Image | -1.15e-05 | 1.12e-05 | 0.974 |

Table 7: Comparisons of error slopes measured from dynamical simulations and predicted from statistical averages of VI activities for various images.

The average rate of change of the error at the beginning of the dynamics simulation, as predicted from statistical measures of the VI activity, is close to the measured rate, except for the Walsh pattern. Being so simple, the Walsh pattern would be expected to have the most extreme statistics of the three images. One sees from the plot of error vs time for the Walsh pattern (figure 14c) that the time evolution of the error is much more uneven here than in the other two cases. For this image a VI activity, and therefore a change in the error, is either high or low, because the image is composed only of edges or blank spaces. The approximation of the change in the error as a derivative in eq. 7 may not be valid for such changes.

4.4 The Role of the Eigenvalues of the Covariance Matrix

Let us now examine how to predict the behavior of the network from knowledge of the eigenvalues of the covariance matrix of the inputs alone, since this allows one to make contact with various computational schemes involving eigenvalues. Define a "weight vector" $w_k$ as the vector composed of the values $w_{jk}$ as the index $j$ runs over its range. The receptive field of neuron $k$ in VI, as defined by the set $w_{jk}$, may thus be interpreted as a weight vector $w_k$. With the learning algorithm presented, the weight vectors develop into eigenvectors of the covariance matrix. Consider the rate of decrease of the error:

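$$\langle \dot E \rangle = \Big\langle \sum_j y_j \dot y_j \Big\rangle \tag{18}$$

$$= \Big\langle -\sum_j y_j a_k w_{jk} \Big\rangle \tag{19}$$

$$= \Big\langle -\sum_{ij} y_i y_j w_{ik} w_{jk} \Big\rangle \tag{20}$$

$$= -\sum_{ij} \langle y_i y_j \rangle w_{ik} w_{jk} \tag{21}$$

$$= -\sum_{ij} R_{ij} w_{ik} w_{jk} \tag{22}$$

$$= -\sum_j \lambda_k w_{jk} w_{jk} \tag{23}$$

$$= -\lambda_k |w_k|^2 \tag{24}$$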

where $R_{ij} = \langle y_i y_j \rangle$ and $\lambda_k$ is the eigenvalue of $w_k$. Therefore the speed of the network depends on the magnitude of the eigenvalues. By symmetry, the value of $\lambda_k$ is in principle independent of $k$. Eigenvalues were found by dividing $\langle a_k^2 \rangle$ $(= -\langle \dot E \rangle)$ by $|w_k|^2$ for various $k$. The image used to find the $a_k$ values was the random image, since that was the image used to develop the weights and is expected to have the most uniform statistics. Measurements of the weight values of the network show that the mean and variance of $\lambda_k$ are $2.75 \times 10^{-5}$ and $1.57 \times 10^{-9}$, respectively.
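The chain of eqs. 21 to 24 can be checked numerically on a small example. The sketch below builds a hypothetical 4-dimensional covariance matrix with a known orthonormal set of eigenvectors (scaled Hadamard rows) and an assumed spectrum, then evaluates $\sum_{ij} R_{ij} w_{ik} w_{jk}$ directly. For an eigenvector the result equals $\lambda_k |w_k|^2$; for a vector that mixes two eigenvectors it is instead a weighted average of their eigenvalues.

```python
import math

# Hypothetical eigenvectors of R (orthonormal, scaled Hadamard rows) and an assumed spectrum.
BASIS = [[0.5, 0.5, 0.5, 0.5],
         [0.5, -0.5, 0.5, -0.5],
         [0.5, 0.5, -0.5, -0.5],
         [0.5, -0.5, -0.5, 0.5]]
EIGENVALUES = [3.0, 1.0, 0.5, 0.2]


def covariance_from_spectrum(basis, eigenvalues):
    """R_ij = sum_m lambda_m e_m[i] e_m[j]: a symmetric matrix whose eigenpairs are known."""
    n = len(basis[0])
    return [[sum(lam * e[i] * e[j] for lam, e in zip(eigenvalues, basis))
             for j in range(n)] for i in range(n)]


def quadratic_form(R, w):
    """sum_ij R_ij w_i w_j, which equals <a^2> for a = sum_j y_j w_j when R_ij = <y_i y_j>."""
    n = len(w)
    return sum(R[i][j] * w[i] * w[j] for i in range(n) for j in range(n))


R = covariance_from_spectrum(BASIS, EIGENVALUES)
w = [0.7 * x for x in BASIS[0]]                         # eigenvector, lambda = 3, |w|^2 = 0.49
rate = -quadratic_form(R, w)                            # eqs. 21-22
predicted = -EIGENVALUES[0] * sum(x * x for x in w)     # eq. 24
mix = [(p + q) / math.sqrt(2) for p, q in zip(BASIS[0], BASIS[1])]  # unit norm, not an eigenvector
print(rate, predicted, quadratic_form(R, mix))
```

The mixed vector gives $(3 + 1)/2 = 2$, which is not an eigenvalue, so eq. 24 holds only to the extent that the receptive fields really are eigenvectors; this is what the next subsection tests.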



4.4.1 Testing for Eigen Properties 

We would now like to verify whether the receptive fields $w_k$ obtained in the simulations of cortical development are really eigenvectors of the covariance matrix of the inputs. We cannot construct the covariance matrix directly because it is too large, but we can approximate it by sequential averaging over a series of inputs. Let $R_{ij}$ be the covariance matrix of the inputs $y_i$ and consider

$$\Big\langle \sum_j y_i y_j w_{jk} \Big\rangle = \sum_j R_{ij} w_{jk} \tag{25}$$

$$= \lambda_k w_{ik} \tag{26}$$

if $w_k$ is an eigenvector of $R$, where $\lambda_k$ is its eigenvalue. Since

$$\sum_j y_i y_j w_{jk} = y_i a_k, \tag{27}$$

we must determine whether

$$\langle y_i a_k \rangle = \lambda_k w_{ik} \tag{28}$$

for some $\lambda_k$. The average is easy to find by applying the known weight values to a series of images. The weight vector $w_k$ of one neuron was chosen arbitrarily, and a series of 200 random images was presented. Figure 15a shows the chosen weight vector and 15b shows the quantity $\langle y_i a_k \rangle$. The product is visibly close to a scalar multiple of the weight vector, indicating that the weight vector is, to a good approximation, an eigenvector of the covariance matrix. The closeness of the product to the weight vector was measured as the cosine of the angle between the two, i.e., their inner product divided by the product of their norms. This value was found to be 0.949. The eigenvalue was taken as the ratio of their norms and was $2.87 \times 10^{-5}$. This is not far from the average value obtained by dividing $\langle a_k^2 \rangle$ by $|w_k|^2$ for various $k$.
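The test of eq. 28 amounts to estimating $\langle y_i a_k \rangle$ by averaging over inputs and comparing it with $w_k$. A minimal sketch, assuming zero-mean Gaussian inputs drawn from a hypothetical covariance with known eigenvectors (not the retinal images of the simulations): the cosine between the average and the weight vector is close to 1 for an eigenvector and noticeably lower for a mixture of eigenvectors, and the ratio of norms estimates the eigenvalue, as in the procedure described above.

```python
import math
import random

# Hypothetical covariance: orthonormal eigenvectors (scaled Hadamard rows) and an assumed spectrum.
BASIS = [[0.5, 0.5, 0.5, 0.5],
         [0.5, -0.5, 0.5, -0.5],
         [0.5, 0.5, -0.5, -0.5],
         [0.5, -0.5, -0.5, 0.5]]
EIGENVALUES = [3.0, 1.0, 0.5, 0.2]


def sample_input(rng):
    """Zero-mean Gaussian input y = sum_m sqrt(lambda_m) z_m e_m, so that <y_i y_j> = R_ij."""
    y = [0.0] * len(BASIS[0])
    for lam, e in zip(EIGENVALUES, BASIS):
        z = math.sqrt(lam) * rng.gauss(0.0, 1.0)
        for i, ei in enumerate(e):
            y[i] += z * ei
    return y


def eigen_test(w, n_samples, rng):
    """Average y_i * a over inputs (a = sum_j y_j w_j); return (cosine to w, norm ratio)."""
    acc = [0.0] * len(w)
    for _ in range(n_samples):
        y = sample_input(rng)
        a = sum(yj * wj for yj, wj in zip(y, w))
        for i, yi in enumerate(y):
            acc[i] += yi * a
    avg = [v / n_samples for v in acc]
    norm_avg = math.sqrt(sum(v * v for v in avg))
    norm_w = math.sqrt(sum(v * v for v in w))
    cosine = sum(p * q for p, q in zip(avg, w)) / (norm_avg * norm_w)
    return cosine, norm_avg / norm_w


rng = random.Random(0)
eig_w = BASIS[0]                                                       # eigenvector, lambda = 3
mix_w = [(p + q) / math.sqrt(2) for p, q in zip(BASIS[0], BASIS[1])]   # not an eigenvector
cos_eig, lam_est = eigen_test(eig_w, 20000, rng)
cos_mix, _ = eigen_test(mix_w, 20000, rng)
print(f"eigenvector: cosine {cos_eig:.3f}, eigenvalue estimate {lam_est:.2f}")
print(f"mixture:     cosine {cos_mix:.3f}")
```

For the mixture the exact cosine is $2/\sqrt{5} \approx 0.89$, so a measured cosine such as 0.949 lies between a pure eigenvector and an equal mixture of two well-separated eigenvectors.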

4.4.2 Comparing the Dynamics Results to the Eigenvalue Prediction

The mean and variance of $|w_k|^2$ were measured directly and found to be $4.33 \times 10^{-1}$ and $3.71 \times 10^{-3}$, respectively. Thus, by eq. 24, the value of $\langle \dot E \rangle$ may be predicted from the known eigenvalues [13]. The table below compares the predicted and measured values.



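| Image | Measured $\langle \dot E \rangle$ | Predicted $\langle \lambda_k \lvert w_k \rvert^2 \rangle$ | Ratio (predicted / measured) |
|---|---|---|---|
| Lena | -9.33e-04 | 9.37e-04 | 1.004 |
| Walsh Pattern (1,2) | -2.52e-02 | 2.85e-02 | 1.13 |
| Random Image | -1.15e-05 | 1.12e-05 | 0.974 |

Table 8: Comparisons of error slopes measured from dynamical simulations and predicted from eigenvalues for various images.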

The predictions are again close to the measured values.
5 The Fokker-Planck Equation for the Error Value of the Image Processing Network

To construct a Fokker-Planck equation for the error $E$, we must find its first and second moments while the network is processing the image. The first moment is $-\langle a_k^2 \rangle$. For the second moment, we have



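$$\frac{\langle (\Delta E)^2 \rangle}{\Delta t} = \frac{\langle (E(t+\Delta t) - E(t))^2 \rangle}{\Delta t} \tag{29}$$

$$= 2\big\langle \dot E(t)\,\big(E(t+\Delta t) - E(t)\big) \big\rangle \tag{30}$$

$$= 2\Big\langle \dot E(t) \int_t^{t+\Delta t} dt'\, \dot E(t') \Big\rangle \tag{31}$$

$$= 2\langle \dot E(t) \rangle \Big\langle \int_t^{t+\Delta t} dt'\, \dot E(t') \Big\rangle \tag{32}$$

Since $t$ is arbitrary in the above equation, we can replace $\langle \dot E(t) \rangle$ by $\langle \dot E \rangle$.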

In the dynamics simulations, the VI neuron $l$ for a given time step is chosen randomly. The dynamics as implemented is therefore a Langevin equation for the motion of $E$:

$$\dot E = -a_{l(t)}^2 \tag{33}$$

Now, $-a_{l(t)}^2$ is treated as an impulse function, since time is discrete in the simulation. Thus the integral in eq. 32 can be replaced by the value of $\dot E$ at some time other than $t$. This yields

$$\frac{\langle (\Delta E)^2 \rangle}{\Delta t} = 2\langle a_k^2 \rangle^2 \tag{34}$$



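Let $\alpha = \langle a_k^2 \rangle$. The Fokker-Planck equation is [14]

$$\frac{\partial P(E,t)}{\partial t} = \alpha \frac{\partial P(E,t)}{\partial E} + \alpha^2 \frac{\partial^2 P(E,t)}{\partial E^2} \tag{35}$$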

For the short times considered here, we can neglect any dependence of $\alpha$ on $E$. To solve the equation, we note that the initial probability density $P(E,0)$ of the error is a delta function, since the initial error is known exactly as the squared length of the input LGN image. Thus the solution is

$$P(E,t) = \frac{1}{\sqrt{4\pi\alpha^2 t}} \exp\left(-\frac{(E - \bar E(t))^2}{4\alpha^2 t}\right) \tag{36}$$

where $\bar E(t) = E(0) - \alpha t$ is the expectation value of $E$ at time $t$, determined by the initial value of $E$ and the expectation value of $\dot E$. In the following, we compare the results of the dynamical simulations to the prediction of the Fokker-Planck equation.



5.1 Variance Measurements from the Dynamics Simulations

The slopes were measured from the plots of $E$ vs $t$ in figure 14 to find the expectation value of $\dot E$ at $t = 0$. The values are presented in the table below. Because the order in which VI neurons are chosen is random, there will be some variance in the value of the error at a given time. To find the shape of the probability density of the error, the first 1024 steps of the image processing were repeated one hundred thousand times with different choices of the VI neurons. A histogram was then made from the one hundred thousand values of $E$ at $t = 1024$. The histogram is plotted in figure 16 and shows the probability density of $E$ at that time. Note the gaussian shape of the histogram, which is consistent with the form of the Fokker-Planck solution. The variances were measured numerically and are included in the table below.

Let us check the solution to the Fokker-Planck equation. The variance is given by

$$\sigma^2 = 2\alpha^2 t \tag{37}$$

where $-\alpha$ is the average slope of $E$ vs $t$. The values of $\alpha$ obtained from the plots of $E$ vs $t$ are used to calculate the expected variance of the error at $t = 1024$, and this is compared to the measured variance in the following table.
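One way to see where eq. 37 can come from, separate from the derivation above: if the VI activity $a_l$ chosen at each step is Gaussian with variance $\alpha$ and successive steps are independent (both are assumptions of this sketch), then $a_l^2$ has mean $\alpha$ and variance $2\alpha^2$, so after $t$ steps $E$ has mean $E(0) - \alpha t$ and variance $2\alpha^2 t$. The sketch simulates that Langevin toy with assumed values of $E(0)$, $\alpha$ and $t$, and also evaluates eq. 37 for the $\alpha$ values listed in Table 9, which reproduces its computed-variance column to within the rounding of $\alpha$.

```python
import random
import statistics


def predicted_variance(alpha, t):
    """Eq. 37: variance of the error after t steps, sigma^2 = 2 * alpha^2 * t."""
    return 2.0 * alpha * alpha * t


def simulate_final_errors(e0, alpha, t, n_runs, rng):
    """Langevin toy: each step lowers E by a^2, with a ~ N(0, alpha) independent across steps."""
    finals = []
    for _ in range(n_runs):
        e = e0
        for _ in range(t):
            a = rng.gauss(0.0, alpha ** 0.5)   # standard deviation sqrt(alpha), so <a^2> = alpha
            e -= a * a
        finals.append(e)
    return finals


rng = random.Random(0)
alpha, t = 1.0, 64                             # assumed toy values
finals = simulate_final_errors(e0=200.0, alpha=alpha, t=t, n_runs=4000, rng=rng)
print("mean", round(statistics.mean(finals), 1), "expected", 200.0 - alpha * t)
print("variance", round(statistics.variance(finals), 1), "eq. 37", predicted_variance(alpha, t))

# Eq. 37 at t = 1024 for the alpha values listed in Table 9
for name, a in [("Lena", 9.49e-4), ("Walsh Pattern (1,2)", 2.93e-2), ("Random Image", 1.12e-5)]:
    print(name, f"{predicted_variance(a, 1024):.2e}")
```

The Gaussian and independence assumptions are what make the per-step variance equal $2\alpha^2$; when either fails, eq. 37 need not hold.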
| Image | $\langle \dot E \rangle$ | Computed variance | Measured variance |
|---|---|---|---|
| Lena | -9.49e-04 | 1.84e-03 | 5.14e-03 |
| Walsh Pattern (1,2) | -2.93e-02 | 1.76e+00 | 3.04e+01 |
| Random Image | -1.12e-05 | 2.56e-07 | 2.97e-07 |

Table 9: Computed and measured variances of the error of the representation at $t = 1024$.

For the random image, the measured variance is in reasonable agreement with the predicted variance. For the other images, the prediction is far off. This is presumably due to the distribution of $a_k^2$ being somewhat bimodal, with high values where edges occur in the images and low values where there are no edges. Such bimodality would be expected to increase the variance of the error above the prediction, and indeed one sees a greater discrepancy for the Walsh pattern, where the bimodality is more pronounced.
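The bimodality argument can be made quantitative with a hypothetical two-level model: on a fraction $p$ of steps ("edges") $a_l^2$ takes the value $\alpha/p$, and otherwise it is zero, so its mean, and hence the drift of $E$, is the same $\alpha$ as in the Gaussian case. Assuming independent steps, the variance of $E(t)$ is $t$ times the per-step variance of $a_l^2$, which is $\alpha^2(1-p)/p$ in this model versus $2\alpha^2$ for Gaussian activities. The function names are illustrative.

```python
def step_variance_gaussian(alpha):
    """Var(a^2) for a ~ N(0, alpha): a^2 / alpha is chi-square with one degree of freedom."""
    return 2.0 * alpha ** 2


def step_variance_two_level(alpha, p_edge):
    """Var(a^2) when a^2 = alpha / p_edge with probability p_edge ("edge") and 0 otherwise.

    Both models have <a^2> = alpha, so the drift of E is the same.
    """
    high = alpha / p_edge
    return p_edge * high ** 2 - alpha ** 2


alpha = 1.0
for p in (0.5, 1 / 3, 0.2, 0.05):
    excess = step_variance_two_level(alpha, p) / step_variance_gaussian(alpha)
    # with independent steps Var E(t) = t * Var(a^2), so this is also the ratio to eq. 37
    print(f"p_edge = {p:.2f}: variance is {excess:.2f} times the eq. 37 value")
```

In this model the excess over eq. 37 appears only when edges occupy less than a third of the steps and grows as they become sparser, consistent with the largest discrepancy occurring for the Walsh pattern.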

6 Discussion 

The simulations show that the proposed system for representing images has several desirable properties with regard to efficiency and robustness to noise. The scheme is suitable for compressing images at very high ratios. We also found that the time series of VI neuron activities resemble those found in experiments.

Because of the inhibitory feedback, we could define an error as the squared length of the remaining activities in the LGN. This is the same as the squared length of the difference between the original image and its reconstruction from the integral of VI activities. The rate of decrease of the error depends on the image being processed: the more an image consists only of sharp edges, the more quickly the representation in terms of edge detectors is formed. Accordingly, the error decreased most rapidly for the Walsh pattern, then for the Lena image, and most slowly for the random image. Random noise and textures may be represented more efficiently in the brain's image processing by specialized circuits which perhaps model the textures' spectral or statistical properties.

The solution to the Fokker-Planck equation for the image processing network was shown to be gaussian, because the state of the error at $t = 0$ is known precisely. The predicted value of the variance was close to that measured for the random-noise image. The variances for the other images were far off, probably because the distribution of the random term of the corresponding Langevin equation is not gaussian in those cases.

It is interesting that the image processing system in the brain might be close to schemes employed in engineering fields, such as wavelet and Gabor representations; the pressure for optimization is common to both. The orientational feature map, however, is unique to the biological processing system. We believe that the feature map arises because the brain is more limited than an engineered system in the complexity of the feature detectors it can employ. More feature detectors of a simpler type must therefore be used, and the feature map is an efficient way of arranging them to cover as much as possible of the space of input images while preserving local continuity. The latter is crucial for the efficiency of the network described here, since the inhibitory feedback most effectively reduces neural metabolic expenditure when overlapping receptive fields have nearly the same orientation.
Figure 1: a) Original Lena image, b) original Lena image input to the LGN after filtering through the retina, c) reconstructed image after 1 epoch, d) reconstructed image after 20 epochs.
Figure 2: a) Reconstructed image after 1 epoch with lateral interaction, b) reconstructed image after 20 epochs with lateral interaction, c) reconstructed image after 1 epoch with no inhibitory feedback, d) absolute values of activities in VI after twenty steps, contrast enhanced, with inhibitory feedback and no lateral interactions.
Figure 3: a) Original Walsh pattern (1,1) image, b) original LGN input for the Walsh image, c) reconstruction of the Walsh image after 1 epoch, d) reconstruction of the Walsh image after 20 epochs.

Figure 4: a) Original Walsh pattern (1,2) image, b) original LGN input for the Walsh image, c) reconstruction of the Walsh image after 1 epoch, d) reconstruction of the Walsh image after 20 epochs.
Figure 5: a) Original Walsh pattern (2,2) image, b) original LGN input for the Walsh image, c) reconstruction of the Walsh image after 1 epoch, d) reconstruction of the Walsh image after 20 epochs.
Figure 6: a) Original Walsh pattern (4,4) image, b) original LGN input for the Walsh image, c) reconstruction of the Walsh image after 1 epoch, d) reconstruction of the Walsh image after 20 epochs.
Figure 7: a) The LGN image plus noise. Noise from a flat distribution between -0.1 and 0.1 has been added to each pixel value, yielding a signal-to-noise ratio of 0.210. b) Reduction in noise level: the image from a) reconstructed from the VI neural activities. The signal-to-noise ratio is 1.42.
Figure 8: a) Histogram of activities for the original Lena image shown in 1a. b) Histogram of input LGN activities for the Lena image shown in 1b. c) Histogram of the integral of activities of VI neurons for the Lena image, with no lateral interaction. d) Histogram of the integral of activities of VI neurons for the Lena image, with lateral interaction included.
Figure 9: a) Image compressed in dynamic range by a factor of 16; reconstruction after 20 epochs. b) Image compressed by decreasing the dynamic range of the integral of VI activity by a factor of 32; reconstruction after 20 iterations. c) Image compressed by a factor of 16 in dynamic range and by a factor of four in the number of VI neurons. Total compression 64, or 73.3 taking into account the difference in image size and VI size; reconstruction after 20 iterations. d) Image compressed by a factor of 16 in dynamic range, by a factor of four in the number of VI neurons, and by a factor of 3.23 by keeping only the highest-valued VI neurons. Total compression 206.7, or 236.8 taking into account the difference in image size and VI size; reconstruction after 20 iterations.
Figure 10: a) An original edge image; the surrounding box is added for clarity. b) An edge image after convolution. 128 edges were used, with orientations ranging over one-half revolution. The center of rotation was the center of the image.
Figure 11: The sharpening of orientation tuning in VI is demonstrated by a comparison of normalized orientation tuning curves with and without lateral interactions between VI neurons. The figure shows the absolute value of the response of a VI neuron to a stationary edge of various orientations when no lateral interactions are applied. Figure 12 shows the response when the lateral interactions are iterated sixteen times for each orientation. The range of the horizontal axis is one-half of a revolution.
Figure 12: The response of the VI neuron of figure 11 to a stationary
edge of various orientations when the lateral interactions are iterated sixteen 
times for each orientation. A sharpening of the orientational tuning is clearly 
visible. The range of the horizontal axis is one-half revolution. 
Figure 13: Time series of VI neuron activity for Walsh pattern input. Top: results from simulations. A: Walsh pattern (1,1), B: Walsh pattern (4,4); the surrounding box is added for clarity, and the gray background is illustrated as white. Middle: time series smoothed by gaussian filtering of width two. The curves rise immediately upon presentation of an image, then fall as the inhibitory feedback cancels the image in the LGN. See figures 3 and 6 for the Walsh patterns. Bottom: results and patterns from monkey experiments [11, 12].
Figure 14: Initial segment of error vs time, with the error defined as the total squared magnitude of LGN activity, for the images a) Lena, b) Lena (long time), c) Walsh pattern (1,2), d) random.
Figure 15: a) Weight vector used to check for the eigen property, b) average of the product of the weight vector and LGN activities, for random inputs. The resemblance to the original weight vector indicates that the weight vector is an eigenvector.
Figure 16: Histogram of error values at $t = 1024$ for one hundred thousand runs. The simulation was done on the random image. Note the gaussian shape of the curve, showing that the solution to the Fokker-Planck equation is of the correct form.